ABSTRACT

The use of techniques for feature selection allows one to treat classification problems in spaces of reduced dimensions. This note considers a method of linear feature selection for n-dimensional observation vectors which belong to one of two populations, where each population is described by a known multivariate normal density function. More specifically, we consider the problem of finding a 1×n transformation matrix B for which the probability of misclassification with respect to the one-dimensional transformed density functions is minimized. Theoretical results are presented which give rise to a numerically tractable expression for the variation in the probability of misclassification with respect to B. Using this expression, we discuss a computational procedure for obtaining a B which minimizes the probability of misclassification. Preliminary numerical results are discussed.
PREFACE

Multispectral scanners have been developed to collect data remotely (from aircraft and spacecraft) from which earth resources information can be extracted. The analysis and interpretation of these data require sophisticated mathematical techniques. The ability to use the data for specific applications depends to a large extent upon the accuracy and speed of the analytical techniques, together with the ability to compute complicated mathematical expressions.

The specific problem addressed in this report is that of combining a given set of spectral features to obtain a smaller set of spectral features which retain "to the greatest possible degree" the information inherent in the original measurements. Combining spectral features while retaining inherent information is a preclassification technique used to reduce the prohibitive data storage and computation time requirements encountered in the classification technique itself. A technique is developed in this report for combining spectral features in such a way that the optimal retention of information is accomplished by minimizing the probability of incorrectly classifying observations.

The mathematical expression for computing the probability that an observation will be incorrectly classified is complicated. However, it is generally agreed that it is, both theoretically and empirically, the best criterion by which one can measure information degradation when attempting to combine spectral features.
ON MINIMIZING THE PROBABILITY OF MISCLASSIFICATION FOR LINEAR FEATURE SELECTION


1. Introduction 

Consider two populations $\Pi_1$ and $\Pi_2$ with associated multivariate normal density functions defined for $x = (x_1, \ldots, x_n)^T \in E^n$ by

$$p_i(x) = (2\pi)^{-n/2} |\Sigma_i|^{-1/2} \exp\left(-\tfrac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right), \quad i = 1,2.$$

If $B = (b_1, \ldots, b_n)$ is a nonzero $1 \times n$ vector and $x \in E^n$, then $Bx \in E^1$, and the populations $\Pi_1$ and $\Pi_2$ have transformed normal density functions defined for $y \in E^1$ by

$$p_i(y,B) = (2\pi)^{-1/2} (B\Sigma_iB^T)^{-1/2} \exp\left(-\frac{(y - B\mu_i)^2}{2B\Sigma_iB^T}\right), \quad i = 1,2.$$
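Only the two scalars $B\mu_i$ and $B\Sigma_iB^T$ enter $p_i(y,B)$, so every later computation starts from them. The following Python sketch (function names are ours, not the report's) computes these scalars with plain lists and evaluates the one-dimensional density.

```python
import math

def transform(B, mu, Sigma):
    """Return (B mu, B Sigma B^T) for a 1 x n vector B given as a plain list."""
    n = len(B)
    m = sum(B[j] * mu[j] for j in range(n))
    v = sum(B[j] * Sigma[j][k] * B[k] for j in range(n) for k in range(n))
    return m, v

def p_transformed(y, m, v):
    """One-dimensional normal density with mean m = B mu and variance v = B Sigma B^T."""
    return math.exp(-(y - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

# Example: B picks out the first coordinate, so Bx is N(mu_1, Sigma_11).
m, v = transform([1.0, 0.0], [1.0, 2.0], [[2.0, 0.5], [0.5, 3.0]])
print(m, v)  # 1.0 2.0
```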

The linear feature selection problem considered in the sequel is to choose a B which minimizes the probability of misclassification of a transformed observation in $E^1$ using a Bayes optimal (maximum likelihood) classification scheme. If the a priori probabilities that an observation comes from either $\Pi_1$ or $\Pi_2$ are equal, then the transformed probability of misclassification in $E^1$ as a function of B, denoted by g, is given [1] by

$$g(B) = \frac{1}{2}\left[\int_{R_1(B)} p_2(y,B)\,dy + \int_{R_2(B)} p_1(y,B)\,dy\right],$$

where

$$R_1(B) = \{y \in E^1 : p_1(y,B) \ge p_2(y,B)\}$$

and

$$R_2(B) = \{y \in E^1 : p_1(y,B) < p_2(y,B)\}.$$

(If $p_1(y,B) \equiv p_2(y,B)$, we define $g(B) = \tfrac{1}{2}$.) If g is to be minimized as a function of B, it is natural to ask whether g is a differentiable function of the elements of B. If such is the case, then the minimum value of g will occur only if these derivatives all vanish.
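Because $R_1(B)$ and $R_2(B)$ assign each y to the population with the larger transformed density, the two integrals combine into $g(B) = \tfrac{1}{2}\int \min\{p_1(y,B), p_2(y,B)\}\,dy$. A brute-force quadrature of this form gives a reference value against which faster formulas can be checked. The Python sketch below assumes that truncating the integral at twelve standard deviations is harmless; the names are illustrative.

```python
import math

def density(y, m, v):
    """N(m, v) density, with m = B mu_i and v = B Sigma_i B^T."""
    return math.exp(-(y - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

def g_quadrature(m1, v1, m2, v2, steps=100_000):
    """g(B) = (1/2) * integral of min(p1, p2), by the midpoint rule on a truncated interval."""
    s = math.sqrt(max(v1, v2))
    lo, hi = min(m1, m2) - 12.0 * s, max(m1, m2) + 12.0 * s
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        total += min(density(y, m1, v1), density(y, m2, v2))
    return 0.5 * total * h

# Equal variances: g reduces to Phi(-|m1 - m2| / (2 sqrt(v))), here Phi(-0.5).
print(g_quadrature(0.0, 1.0, 1.0, 1.0))
```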

In the sequel it is shown that g is a differentiable function of the elements of B, and a formula for its derivatives is given. A method for obtaining numerically a B which minimizes g is discussed. Results obtained using statistics from C-1 Flight Line Data as population parameters are presented.

We remark that a more general result is given in [2] concerning the differentiability of g when B is a $k \times n$ matrix of rank k. Unfortunately, when k is greater than 1, the result of [2] does not always guarantee the differentiability of g. Moreover, when g is differentiable, the formulas for its derivatives are not numerically tractable.

We have found it convenient to work with the Gateaux differential of g at B with increment C, denoted by $\delta g(B;C)$ and defined (if the limit exists) by

$$\delta g(B;C) = \lim_{s \to 0} \frac{g(B+sC) - g(B)}{s}$$

for a $1 \times n$ vector C. If for a given B the above limit exists for each $1 \times n$ vector C, then g is said to be Gateaux differentiable at B [3, p. 171]. If g is Gateaux differentiable at B, then the derivative of g with respect to, say, the $j$th component of B is given by $\delta g(B;C_j)$, where $C_j$ is the $1 \times n$ vector with a 1 in the $j$th slot and zeros elsewhere. Similarly, if B is a nonzero $1 \times n$ vector and C is a $1 \times n$ vector, we define

$$\delta p_i(y,B;C) = \lim_{s \to 0} \frac{p_i(y,B+sC) - p_i(y,B)}{s}, \quad i = 1,2.$$
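A Gateaux differential can be approximated by a central difference, which is a convenient way to check any closed-form expression for $\delta g(B;C)$. The Python sketch below (hypothetical helper names) also stacks the differentials along the unit vectors $C_j$ to form the vector of partial derivatives.

```python
def gateaux(g, B, C, s=1e-6):
    """Central-difference estimate of the Gateaux differential dg(B; C)."""
    plus = [b + s * c for b, c in zip(B, C)]
    minus = [b - s * c for b, c in zip(B, C)]
    return (g(plus) - g(minus)) / (2.0 * s)

def gradient(g, B, s=1e-6):
    """Stack dg(B; C_j), j = 1..n, where C_j is the j-th unit row vector."""
    n = len(B)
    return [gateaux(g, B, [1.0 if k == j else 0.0 for k in range(n)], s) for j in range(n)]

# Example: g(B) = b1^2 + 3 b1 b2 has dg(B; C) = (2 b1 + 3 b2) c1 + 3 b1 c2.
f = lambda B: B[0] ** 2 + 3.0 * B[0] * B[1]
print(gradient(f, [1.0, 2.0]))  # approximately [8.0, 3.0]
```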


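We also define

$$F(y,B) = \log\frac{p_1(y,B)}{p_2(y,B)} = \frac{1}{2}\log\frac{B\Sigma_2B^T}{B\Sigma_1B^T} - \frac{(y-B\mu_1)^2}{2B\Sigma_1B^T} + \frac{(y-B\mu_2)^2}{2B\Sigma_2B^T}.$$

Then

$$R_1(B) = \{y \in E^1 : F(y,B) \ge 0\}, \qquad R_2(B) = \{y \in E^1 : F(y,B) < 0\},$$

and we let

$$S(B) = \{y \in E^1 : F(y,B) = 0\}.$$

Note that since F(y,B) is a quadratic function of y, S(B) consists of at most two points (unless $p_1(y,B) \equiv p_2(y,B)$, in which case $F \equiv 0$).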
2. Differentiating the Probability of Misclassification

In this section we show that for a nonzero $1 \times n$ vector B, $\delta g(B;C)$ exists for each $1 \times n$ vector C. We also obtain a formula for $\delta g(B;C)$ which is numerically tractable.

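THEOREM 1. Let B be a nonzero $1 \times n$ vector. Then $\delta g(B;C)$ exists for each $1 \times n$ vector C and is given by

$$\delta g(B;C) = \begin{cases} 0, & \text{if } p_1(y,B) \equiv p_2(y,B), \\ \dfrac{1}{2}\displaystyle\int_{R_1(B)} \delta p_2(y,B;C)\,dy + \dfrac{1}{2}\displaystyle\int_{R_2(B)} \delta p_1(y,B;C)\,dy, & \text{if } p_1(y,B) \not\equiv p_2(y,B). \end{cases}$$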


Proof: If $p_1(y,B) \equiv p_2(y,B)$, then g has attained its maximum (namely, $\tfrac{1}{2}$) as a function of B. Applying the techniques developed in the remainder of the proof, one can show that g is Gateaux differentiable for this B. Therefore $\delta g(B;C) = 0$ for all C.

If $p_1(y,B) \not\equiv p_2(y,B)$, then for small $s \neq 0$ we have

$$\begin{aligned}
\frac{g(B+sC) - g(B)}{s} &= \frac{1}{2s}\left\{\int_{R_1(B+sC)} p_2(y,B+sC)\,dy + \int_{R_2(B+sC)} p_1(y,B+sC)\,dy - \int_{R_1(B)} p_2(y,B)\,dy - \int_{R_2(B)} p_1(y,B)\,dy\right\} \\
&= \frac{1}{2}\int_{R_1(B+sC)} \frac{p_2(y,B+sC) - p_2(y,B)}{s}\,dy + \frac{1}{2}\int_{R_2(B+sC)} \frac{p_1(y,B+sC) - p_1(y,B)}{s}\,dy \\
&\quad + \frac{1}{2}\int_{R_1(B+sC)\setminus R_1(B)} \frac{p_2(y,B) - p_1(y,B)}{s}\,dy + \frac{1}{2}\int_{R_1(B)\setminus R_1(B+sC)} \frac{p_1(y,B) - p_2(y,B)}{s}\,dy.
\end{aligned}$$
It is readily verified that

$$\lim_{s\to 0}\left\{\frac{1}{2}\int_{R_1(B+sC)} \frac{p_2(y,B+sC) - p_2(y,B)}{s}\,dy + \frac{1}{2}\int_{R_2(B+sC)} \frac{p_1(y,B+sC) - p_1(y,B)}{s}\,dy\right\} = \frac{1}{2}\int_{R_1(B)} \delta p_2(y,B;C)\,dy + \frac{1}{2}\int_{R_2(B)} \delta p_1(y,B;C)\,dy.$$

Then the theorem will be proved if it can be shown that

$$\lim_{s\to 0}\int_{R_1(B)\setminus R_1(B+sC)} \frac{p_1(y,B) - p_2(y,B)}{s}\,dy = \lim_{s\to 0}\int_{R_1(B+sC)\setminus R_1(B)} \frac{p_2(y,B) - p_1(y,B)}{s}\,dy = 0.$$

We show that the first limit exists and is equal to zero. The second limit is handled similarly.

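First note that S(B) is the set of solutions of the equation

$$(*) \qquad 2(B\Sigma_1B^T)(B\Sigma_2B^T)\,F(y,B) = \alpha(B)y^2 + 2\beta(B)y + \gamma(B) = 0,$$

where

$$\alpha(B) = B(\Sigma_1 - \Sigma_2)B^T,$$

$$\beta(B) = -\left[(B\Sigma_1B^T)(B\mu_2) - (B\Sigma_2B^T)(B\mu_1)\right],$$

and

$$\gamma(B) = (B\Sigma_1B^T)(B\mu_2)^2 - (B\Sigma_2B^T)(B\mu_1)^2 + (B\Sigma_1B^T)(B\Sigma_2B^T)\log\frac{B\Sigma_2B^T}{B\Sigma_1B^T}.$$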

We make the following observation: since $p_1(y,B) \not\equiv p_2(y,B)$, F(y,B) must take on both positive and negative values. (Otherwise, we would have either $p_1(y,B) \ge p_2(y,B)$ or $p_1(y,B) \le p_2(y,B)$ for all y. Since both $p_1(y,B)$ and $p_2(y,B)$ are continuous and have integral 1, either eventuality would imply $p_1(y,B) \equiv p_2(y,B)$.) From this observation, we see that if $\alpha(B) \neq 0$, then (*) has two distinct real solutions, and if $\alpha(B) = 0$, then $\beta(B) \neq 0$ and (*) has exactly one solution.
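Multiplying F(y,B) by the positive number $2(B\Sigma_1B^T)(B\Sigma_2B^T)$ gives exactly the quadratic $\alpha(B)y^2 + 2\beta(B)y + \gamma(B)$ and does not change its sign, which is why the roots of this quadratic are the points of S(B). The Python sketch below checks the identity numerically for illustrative values of $m_i = B\mu_i$ and $v_i = B\Sigma_iB^T$ (the sample numbers are arbitrary).

```python
import math

def F(y, m1, v1, m2, v2):
    """log(p1(y,B) / p2(y,B)) written in terms of m_i = B mu_i and v_i = B Sigma_i B^T."""
    return (0.5 * math.log(v2 / v1)
            - (y - m1) ** 2 / (2.0 * v1)
            + (y - m2) ** 2 / (2.0 * v2))

def coefficients(m1, v1, m2, v2):
    """alpha(B), beta(B), gamma(B) of equation (*)."""
    alpha = v1 - v2
    beta = v2 * m1 - v1 * m2
    gamma = v1 * m2 ** 2 - v2 * m1 ** 2 + v1 * v2 * math.log(v2 / v1)
    return alpha, beta, gamma

m1, v1, m2, v2 = 1.0, 2.0, -0.5, 0.7
a, b, c = coefficients(m1, v1, m2, v2)
for y in (-3.0, -0.2, 0.0, 1.5, 4.0):
    lhs = 2.0 * v1 * v2 * F(y, m1, v1, m2, v2)
    rhs = a * y * y + 2.0 * b * y + c
    print(f"{y:5.1f}  {lhs:+.12f}  {rhs:+.12f}")
# alpha != 0 here, so (*) should have two distinct real roots: beta^2 - alpha*gamma > 0.
print(b * b - a * c > 0)
```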

If $\alpha(B) \neq 0$, let $y_+(B)$ and $y_-(B)$ denote the two distinct real solutions of (*). Since F(y,B) is quadratic in y and (*) has distinct solutions, it follows that $\frac{\partial}{\partial y}F(y_+(B),B) \neq 0$ and $\frac{\partial}{\partial y}F(y_-(B),B) \neq 0$. Then, by the Implicit Function Theorem, there exist unique functions $y_+(s)$ and $y_-(s)$, defined and continuously differentiable for small s, which satisfy $y_+(0) = y_+(B)$, $y_-(0) = y_-(B)$, and $F(y_+(s), B+sC) = F(y_-(s), B+sC) = 0$ for small s. That is to say, the points of $S(B+sC)$ vary in a continuously differentiable way for small s, and it follows that, for small s, $R_1(B)\setminus R_1(B+sC)$ is an interval or pair of intervals not exceeding $K|s|$ in length (for an appropriate constant K independent of s). Then, for small s, there exists a constant K' for which

$$\left|\int_{R_1(B)\setminus R_1(B+sC)} \frac{p_1(y,B) - p_2(y,B)}{s}\,dy\right| \le K' \max_{y \in R_1(B)\setminus R_1(B+sC)} \left|p_1(y,B) - p_2(y,B)\right|.$$
Since $R_1(B)\setminus R_1(B+sC)$ is composed of intervals of length not exceeding $K|s|$ and having either $y_+(B)$ or $y_-(B)$ as an endpoint, and since $p_1(y_+(B),B) - p_2(y_+(B),B) = p_1(y_-(B),B) - p_2(y_-(B),B) = 0$, we have

$$\lim_{s\to 0}\int_{R_1(B)\setminus R_1(B+sC)} \frac{p_1(y,B) - p_2(y,B)}{s}\,dy = 0,$$

as desired.

If $\alpha(B) = 0$, then (*) has exactly one solution $y(B) = \dfrac{B\mu_1 + B\mu_2}{2}$. Since $\alpha(B+sC)$ is a polynomial in s, either $\alpha(B+sC) \equiv 0$ for all s or $\alpha(B+sC) \neq 0$ in some deleted neighborhood of $s = 0$. If $\alpha(B+sC) \equiv 0$ for all s, then

$$y(s) = \frac{(B+sC)\mu_1 + (B+sC)\mu_2}{2}$$

is the unique continuously differentiable function of s satisfying $y(0) = y(B)$ and $F(y(s), B+sC) = 0$. Reasoning as in the paragraph above, we obtain

$$\lim_{s\to 0}\int_{R_1(B)\setminus R_1(B+sC)} \frac{p_1(y,B) - p_2(y,B)}{s}\,dy = 0.$$


If $\alpha(B+sC) \neq 0$ in some deleted neighborhood of $s = 0$, then, for small $s \neq 0$, the roots of (*) are

$$y_\pm(s) = \frac{-\beta(B+sC) \pm \sqrt{\beta(B+sC)^2 - \alpha(B+sC)\gamma(B+sC)}}{\alpha(B+sC)}.$$

Without loss of generality, we assume that $-\beta(B+sC) > 0$ for small s. Then, for small $s \neq 0$, we have $|y_+(s)| \ge K/|s|$ for an appropriate constant K independent of s. Furthermore,

$$y_-(s) = \frac{-\beta(B+sC) - \sqrt{\beta(B+sC)^2 - \alpha(B+sC)\gamma(B+sC)}}{\alpha(B+sC)} = -\frac{\gamma(B+sC)}{2\beta(B+sC)} + O(s) = \frac{B\mu_1 + B\mu_2}{2} + O(s) = y(B) + O(s)$$

for small s. Consequently, we can find a constant K' for which

$$\left|\int_{R_1(B)\setminus R_1(B+sC)} \frac{p_1(y,B) - p_2(y,B)}{s}\,dy\right| \le K' \max_{|y - y(B)| \le K'|s|} \left|p_1(y,B) - p_2(y,B)\right| + \frac{1}{|s|}\int_{|y| \ge K/|s|} \left|p_1(y,B) - p_2(y,B)\right| dy$$

for small $s \neq 0$. Since $p_1(y(B),B) - p_2(y(B),B) = 0$ and $|p_1(y,B) - p_2(y,B)|$ approaches zero exponentially as |y| becomes large, it follows that

$$\lim_{s\to 0}\int_{R_1(B)\setminus R_1(B+sC)} \frac{p_1(y,B) - p_2(y,B)}{s}\,dy = 0.$$

This completes the proof of Theorem 1.







According to Theorem 1, for nonzero B the Gateaux differential $\delta g(B;C)$ exists and can be calculated using the Gateaux differentials $\delta p_1(y,B;C)$ and $\delta p_2(y,B;C)$ of the transformed density functions. In the lemma below, we calculate $\delta p_i(y,B;C)$. For convenience, we omit subscripts.

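Lemma. Let B be a nonzero $1 \times n$ vector. Then

$$\delta p(y,B;C) = p(y,B)\left[\frac{C\mu}{B\Sigma B^T}(y - B\mu) - \frac{C\Sigma B^T}{B\Sigma B^T} + \frac{C\Sigma B^T}{(B\Sigma B^T)^2}(y - B\mu)^2\right]$$

for each $1 \times n$ vector C.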
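Proof: We have

$$p(y,B) = (2\pi)^{-1/2} f_0(B)^{-1/2} \exp\left(-\tfrac{1}{2} f_1(B)\right),$$

where $f_0(B) = B\Sigma B^T$ and $f_1(B) = \dfrac{(y - B\mu)^2}{B\Sigma B^T}$. It is easily verified that

$$\begin{aligned}
\delta p(y,B;C) &= (2\pi)^{-1/2}\left\{-\tfrac{1}{2} f_0(B)^{-3/2}\,\delta f_0(B;C)\exp\left(-\tfrac{1}{2}f_1(B)\right) - \tfrac{1}{2} f_0(B)^{-1/2}\exp\left(-\tfrac{1}{2}f_1(B)\right)\delta f_1(B;C)\right\} \\
&= -\tfrac{1}{2}\,p(y,B)\left\{f_0(B)^{-1}\,\delta f_0(B;C) + \delta f_1(B;C)\right\}
\end{aligned}$$

and that $\delta f_0(B;C) = 2C\Sigma B^T$. Now $f_1(B) = \dfrac{(y - B\mu)^2}{f_0(B)}$, and one sees without difficulty that

$$\delta f_1(B;C) = \frac{-2C\mu}{f_0(B)}(y - B\mu) - \frac{(y - B\mu)^2}{f_0(B)^2}\,\delta f_0(B;C) = \frac{-2C\mu}{B\Sigma B^T}(y - B\mu) - \frac{2C\Sigma B^T}{(B\Sigma B^T)^2}(y - B\mu)^2.$$

After a brief calculation, the lemma follows.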
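The lemma can be spot-checked numerically. The Python sketch below (hypothetical helper names; Σ is assumed symmetric, as the step $\delta f_0(B;C) = 2C\Sigma B^T$ requires) compares the closed form with a central difference in B at a few values of y.

```python
import math

def _stats(B, mu, Sigma):
    n = len(B)
    m = sum(B[j] * mu[j] for j in range(n))
    v = sum(B[j] * Sigma[j][k] * B[k] for j in range(n) for k in range(n))
    return m, v

def p(y, B, mu, Sigma):
    """Transformed density p(y, B)."""
    m, v = _stats(B, mu, Sigma)
    return math.exp(-(y - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

def dp_lemma(y, B, C, mu, Sigma):
    """delta p(y, B; C) from the Lemma (Sigma must be symmetric)."""
    n = len(B)
    m, v = _stats(B, mu, Sigma)
    c_mu = sum(C[j] * mu[j] for j in range(n))                                 # C mu
    c_sb = sum(C[j] * Sigma[j][k] * B[k] for j in range(n) for k in range(n))  # C Sigma B^T
    r = y - m
    return p(y, B, mu, Sigma) * (c_mu * r / v - c_sb / v + c_sb * r * r / v ** 2)

def dp_numeric(y, B, C, mu, Sigma, s=1e-6):
    """Central-difference estimate of the same Gateaux differential."""
    plus = [b + s * c for b, c in zip(B, C)]
    minus = [b - s * c for b, c in zip(B, C)]
    return (p(y, plus, mu, Sigma) - p(y, minus, mu, Sigma)) / (2.0 * s)

mu, Sigma = [1.0, -1.0], [[2.0, 0.3], [0.3, 1.0]]
B, C = [0.6, 0.8], [0.2, -0.5]
for y in (-1.0, 0.4, 2.5):
    print(y, dp_lemma(y, B, C, mu, Sigma), dp_numeric(y, B, C, mu, Sigma))
```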

Theorem 1 and the above lemma provide an explicit formula for $\delta g(B;C)$. Unfortunately, this formula does not lend itself to easy computation because of the integrals that appear. Remarkably enough, a short calculation yields the formula of Theorem 2 below, in which no integrals appear. In order to summarize the results of this section, we incorporate some of the statement of Theorem 1 in the statement of Theorem 2. We also recall that, if $p_1(y,B) \not\equiv p_2(y,B)$, then either $S(B) = \{a\}$ or $S(B) = \{a_-, a_+\}$ for some $a, a_-, a_+$ in $E^1$ (with $a_- < a_+$). In the statement of the theorem, we use the notation

$$f(y)\Big|_{S(B)} = \begin{cases} f(a), & \text{if } S(B) = \{a\}, \\ f(a_+) - f(a_-), & \text{if } S(B) = \{a_-, a_+\}, \end{cases}$$

for a function f(y) on $E^1$.

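THEOREM 2. Let B be a nonzero $1 \times n$ vector. Then $\delta g(B;C)$ exists for each $1 \times n$ vector C and is given by

$$\delta g(B;C) = \begin{cases} 0, & \text{if } p_1(y,B) \equiv p_2(y,B), \\ \pm\dfrac{1}{2}\,p_1(y,B)\left[C\mu_2 + \dfrac{C\Sigma_2B^T}{B\Sigma_2B^T}(y - B\mu_2) - C\mu_1 - \dfrac{C\Sigma_1B^T}{B\Sigma_1B^T}(y - B\mu_1)\right]\Bigg|_{S(B)}, & \text{if } p_1(y,B) \not\equiv p_2(y,B), \end{cases}$$

where the sign is taken to be $\lim_{y\to+\infty}\left[\operatorname{sign}\log\dfrac{p_1(y,B)}{p_2(y,B)}\right]$.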


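Proof: We already know from Theorem 1 that, for nonzero B, $\delta g(B;C)$ exists for all C and $\delta g(B;C) = 0$ if $p_1(y,B) \equiv p_2(y,B)$. If $p_1(y,B) \not\equiv p_2(y,B)$, then, according to Theorem 1,

$$\delta g(B;C) = \frac{1}{2}\int_{R_1(B)} \delta p_2(y,B;C)\,dy + \frac{1}{2}\int_{R_2(B)} \delta p_1(y,B;C)\,dy.$$

It must be shown that this expression is the same as the corresponding expression in the statement of Theorem 2. We show below that

$$\int_{R_1(B)} \delta p_2(y,B;C)\,dy = \pm\,p_1(y,B)\left[C\mu_2 + \frac{C\Sigma_2B^T}{B\Sigma_2B^T}(y - B\mu_2)\right]\Bigg|_{S(B)},$$

and, in the same way,

$$\int_{R_2(B)} \delta p_1(y,B;C)\,dy = \mp\,p_1(y,B)\left[C\mu_1 + \frac{C\Sigma_1B^T}{B\Sigma_1B^T}(y - B\mu_1)\right]\Bigg|_{S(B)},$$

where the sign is taken to be $\lim_{y\to+\infty}\left[\operatorname{sign}\log\dfrac{p_1(y,B)}{p_2(y,B)}\right]$. For simplicity, we assume that the equation $F(y,B) = 0$ has only one root, i.e., $S(B) = \{a\}$ for some $a \in E^1$, and that $R_1(B) = (-\infty, a]$. The remaining cases and expressions are dealt with similarly.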

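From the preceding lemma, we obtain

$$\int_{R_1(B)} \delta p_2(y,B;C)\,dy = -\frac{C\Sigma_2B^T}{B\Sigma_2B^T}\int_{-\infty}^{a} p_2(y,B)\,dy + \frac{C\Sigma_2B^T}{(B\Sigma_2B^T)^2}\int_{-\infty}^{a}(y - B\mu_2)^2\,p_2(y,B)\,dy + \frac{C\mu_2}{B\Sigma_2B^T}\int_{-\infty}^{a}(y - B\mu_2)\,p_2(y,B)\,dy.$$

Integrating by parts, we obtain

$$\frac{C\Sigma_2B^T}{(B\Sigma_2B^T)^2}\int_{-\infty}^{a}(y - B\mu_2)^2\,p_2(y,B)\,dy = -\frac{C\Sigma_2B^T}{B\Sigma_2B^T}(y - B\mu_2)\,p_2(y,B)\Big|_{y=a} + \frac{C\Sigma_2B^T}{B\Sigma_2B^T}\int_{-\infty}^{a} p_2(y,B)\,dy$$

and

$$\frac{C\mu_2}{B\Sigma_2B^T}\int_{-\infty}^{a}(y - B\mu_2)\,p_2(y,B)\,dy = -C\mu_2\,p_2(y,B)\Big|_{y=a}.$$

Since $p_2(a,B) = p_1(a,B)$, substitution gives the desired result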
$$\int_{R_1(B)} \delta p_2(y,B;C)\,dy = -\,p_1(y,B)\left[C\mu_2 + \frac{C\Sigma_2B^T}{B\Sigma_2B^T}(y - B\mu_2)\right]\Bigg|_{S(B)},$$

and the proof of Theorem 2 is complete.
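Theorem 2 can be checked in the same spirit. The Python sketch below evaluates g(B) in closed form (splitting the line at the roots of F(y,B) = 0 and applying Φ on each piece), evaluates $\delta g(B;C)$ from Theorem 2, and compares the latter with a central difference of the former, once with two roots and once with equal covariance matrices (single root). All helper names and sample parameters are illustrative assumptions; covariance matrices are assumed symmetric.

```python
import math

def _stats(B, mu, S):
    n = len(B)
    return (sum(B[j] * mu[j] for j in range(n)),
            sum(B[j] * S[j][k] * B[k] for j in range(n) for k in range(n)))

def _Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def _pdf(y, m, v):
    return math.exp(-(y - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

def _F(y, m1, v1, m2, v2):
    return 0.5 * math.log(v2 / v1) - (y - m1) ** 2 / (2 * v1) + (y - m2) ** 2 / (2 * v2)

def _roots(m1, v1, m2, v2):
    """Real roots of alpha y^2 + 2 beta y + gamma = 0, ascending."""
    a, b = v1 - v2, v2 * m1 - v1 * m2
    c = v1 * m2 ** 2 - v2 * m1 ** 2 + v1 * v2 * math.log(v2 / v1)
    if a == 0.0:
        return [-c / (2.0 * b)]
    d = math.sqrt(b * b - a * c)
    return sorted([(-b - d) / a, (-b + d) / a])

def g(B, mu1, S1, mu2, S2):
    """Closed-form g(B): integrate p2 over R1(B) and p1 over R2(B), interval by interval."""
    m1, v1 = _stats(B, mu1, S1)
    m2, v2 = _stats(B, mu2, S2)
    pts = [-math.inf] + _roots(m1, v1, m2, v2) + [math.inf]
    total = 0.0
    for lo, hi in zip(pts, pts[1:]):
        t = hi - 1.0 if lo == -math.inf else (lo + 1.0 if hi == math.inf else 0.5 * (lo + hi))
        m, v = (m2, v2) if _F(t, m1, v1, m2, v2) >= 0 else (m1, v1)   # R1 -> p2, R2 -> p1
        total += _Phi((hi - m) / math.sqrt(v)) - _Phi((lo - m) / math.sqrt(v))
    return 0.5 * total

def dg_theorem2(B, C, mu1, S1, mu2, S2):
    """delta g(B; C) from Theorem 2."""
    n = len(B)
    m1, v1 = _stats(B, mu1, S1)
    m2, v2 = _stats(B, mu2, S2)
    c_mu1 = sum(C[j] * mu1[j] for j in range(n))
    c_mu2 = sum(C[j] * mu2[j] for j in range(n))
    cS1B = sum(C[j] * S1[j][k] * B[k] for j in range(n) for k in range(n))
    cS2B = sum(C[j] * S2[j][k] * B[k] for j in range(n) for k in range(n))
    rts = _roots(m1, v1, m2, v2)
    sign = 1.0 if _F(rts[-1] + 1.0, m1, v1, m2, v2) > 0 else -1.0   # sign of F as y -> +inf

    def h(y):
        return _pdf(y, m1, v1) * (c_mu2 + cS2B * (y - m2) / v2 - c_mu1 - cS1B * (y - m1) / v1)

    at_S = h(rts[0]) if len(rts) == 1 else h(rts[1]) - h(rts[0])    # f(y) evaluated on S(B)
    return sign * 0.5 * at_S

def fd(B, C, *pars, s=1e-6):
    return (g([b + s * c for b, c in zip(B, C)], *pars)
            - g([b - s * c for b, c in zip(B, C)], *pars)) / (2 * s)

two = ([1.0, 0.0], [[1.0, 0.2], [0.2, 2.0]], [0.0, 0.5], [[1.5, 0.0], [0.0, 0.5]])
S = [[2.0, 0.0], [0.0, 1.0]]
one = ([1.0, 1.0], S, [0.0, 0.0], S)
print(dg_theorem2([0.7, -0.4], [0.3, 0.9], *two), fd([0.7, -0.4], [0.3, 0.9], *two))
print(dg_theorem2([0.3, 0.1], [-0.2, 0.5], *one), fd([0.3, 0.1], [-0.2, 0.5], *one))
```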
3. Computational Procedure

In this section we present a method for computing a $1 \times n$ vector B which minimizes the probability of misclassification g. It is well known [3, p. 178] that when B is an extremum of g, then $\delta g(B;C) = 0$ for each $1 \times n$ vector C. It follows that if B minimizes g, then B must satisfy the vector equation

$$\frac{\partial g}{\partial B} = \begin{pmatrix} \delta g(B;C_1) \\ \vdots \\ \delta g(B;C_n) \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix},$$

where $C_j$, $1 \le j \le n$, is a $1 \times n$ vector with a one in the $j$th slot and zeros elsewhere. Our method consists of using a numerically tractable formula for $\partial g/\partial B$, which we obtain from Theorem 2, in a Davidon-Fletcher-Powell iterative procedure, SUBROUTINE DAVIDN, for finding a local minimum of g.

Assuming that both the $n \times 1$ mean vectors $\mu_1$ and $\mu_2$ and the $n \times n$ covariance matrices $\Sigma_1$ and $\Sigma_2$ are known, we describe below the way in which the necessary functions are computed for SUBROUTINE DAVIDN.

To compute the error function

$$\Phi(a) = (2\pi)^{-1/2}\int_{-\infty}^{a} \exp\left(-\tfrac{1}{2}t^2\right)dt,$$

a double precision function, DPHI, is used.
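In a modern language the role of DPHI can be played by the complementary error function, using $\Phi(a) = \tfrac{1}{2}\operatorname{erfc}(-a/\sqrt{2})$. The Python sketch below is only a stand-in (the name `dphi` mirrors the report's DPHI; it is not the original routine). It also shows why erfc is preferable to the algebraically equivalent $\tfrac{1}{2}(1 + \operatorname{erf}(a/\sqrt{2}))$ in the lower tail, where the latter cancels to zero.

```python
import math

def dphi(a):
    """Standard normal distribution function Phi(a), computed via erfc."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))

print(dphi(0.0))                 # 0.5
print(dphi(1.959963984540054))   # close to 0.975
# erfc keeps relative accuracy in the lower tail ...
print(dphi(-10.0))               # about 7.6e-24
# ... whereas 1 + erf(x) cancels completely there.
print(0.5 * (1.0 + math.erf(-10.0 / math.sqrt(2.0))))  # 0.0
```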
For a given $1 \times n$ vector B, let

$$D_i(a,B) = \int_{-\infty}^{a} p_i(y,B)\,dy, \quad i = 1,2, \quad -\infty < a < \infty.$$

In computing the values of $D_i(a,B)$, one uses the relationship

$$D_i(a,B) = \Phi\left(\frac{a - B\mu_i}{(B\Sigma_iB^T)^{1/2}}\right), \quad i = 1,2.$$

After computing the scalars $B\mu_1$, $B\mu_2$, $B\Sigma_1B^T$, and $B\Sigma_2B^T$, we solve the quadratic equation

$$\alpha(B)y^2 + 2\beta(B)y + \gamma(B) = 0,$$

where

$$\alpha(B) = B\Sigma_1B^T - B\Sigma_2B^T,$$

$$\beta(B) = (B\Sigma_2B^T)B\mu_1 - (B\Sigma_1B^T)B\mu_2,$$

and

$$\gamma(B) = (B\Sigma_1B^T)(B\mu_2)^2 - (B\Sigma_2B^T)(B\mu_1)^2 + (B\Sigma_1B^T)(B\Sigma_2B^T)\log\frac{B\Sigma_2B^T}{B\Sigma_1B^T}.$$

As noted in the proof of Theorem 1, this quadratic has either a single root or else two distinct real roots. We treat these cases separately.
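Solving this quadratic with the textbook formula can lose accuracy when $\alpha(B)$ is small, because one root then becomes very large and the other is obtained by subtracting nearly equal numbers. A common remedy, sketched below in Python, computes the larger-magnitude root first and recovers the other from the product of the roots, $\gamma(B)/\alpha(B)$. The relative threshold used to decide that $\alpha(B) = 0$ is our assumption, not something specified in the report; below it, the very distant second root is simply dropped.

```python
import math

def solve_star(alpha, beta, gamma, tol=1e-12):
    """Real roots of alpha*y^2 + 2*beta*y + gamma = 0 in ascending order.

    Uses q = -(beta + sign(beta) * sqrt(beta^2 - alpha*gamma)); the roots are
    q/alpha and gamma/q, which avoids cancellation when alpha is small.
    """
    scale = max(abs(alpha), abs(beta), abs(gamma), 1.0)
    if abs(alpha) <= tol * scale:            # treat as the single-root case alpha(B) = 0
        return [-gamma / (2.0 * beta)]
    disc = beta * beta - alpha * gamma
    if disc < 0.0:
        raise ValueError("no real roots; this cannot occur when p1 and p2 differ")
    q = -(beta + math.copysign(math.sqrt(disc), beta))
    return sorted([q / alpha, gamma / q])

print(solve_star(1.0, 0.0, -4.0))    # [-2.0, 2.0]
print(solve_star(0.0, 1.0, -3.0))    # [1.5]
print(solve_star(1e-8, 1.0, 1.0))    # one huge root and one root near -0.5
```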
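Single Root Case

The quadratic equation has a single root precisely when $\alpha(B) = 0$, that is, when the transformed variances $B\Sigma_1B^T$ and $B\Sigma_2B^T$ are equal. If a denotes the single root, then it is easily verified that

$$a = \frac{B\mu_1 + B\mu_2}{2},$$

$$g(B) = \begin{cases} \tfrac{1}{2} - \tfrac{1}{2}\left[D_1(a,B) - D_2(a,B)\right], & \text{if } B\mu_1 < a, \\ \tfrac{1}{2} + \tfrac{1}{2}\left[D_1(a,B) - D_2(a,B)\right], & \text{if } B\mu_2 < a, \end{cases}$$

and

$$\frac{\partial g}{\partial B} = \begin{cases} \dfrac{1}{2}\,p_1(a,B)\left[(\mu_1 - \mu_2) - \dfrac{(\Sigma_1 + \Sigma_2)B^T}{B\Sigma_1B^T + B\Sigma_2B^T}(B\mu_1 - B\mu_2)\right], & \text{if } B\mu_1 < a, \\ -\dfrac{1}{2}\,p_1(a,B)\left[(\mu_1 - \mu_2) - \dfrac{(\Sigma_1 + \Sigma_2)B^T}{B\Sigma_1B^T + B\Sigma_2B^T}(B\mu_1 - B\mu_2)\right], & \text{if } B\mu_2 < a. \end{cases}$$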
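The single-root formulas translate directly into code. The Python sketch below (hypothetical names; symmetric covariance matrices and $\alpha(B) = 0$ are assumed, and no check is made) returns g(B) and $\partial g/\partial B$. With equal covariance matrices it can be used to confirm that the gradient vanishes at $B = (\mu_1 - \mu_2)^T(\Sigma_1 + \Sigma_2)^{-1}$; elsewhere it can be compared with a finite difference.

```python
import math

def single_root(B, mu1, mu2, S1, S2):
    """g(B) and dg/dB in the single-root case, assuming B S1 B^T == B S2 B^T."""
    n = len(B)

    def dot(u, w):
        return sum(u[j] * w[j] for j in range(n))

    SB1 = [dot(S1[j], B) for j in range(n)]            # Sigma_1 B^T
    SB2 = [dot(S2[j], B) for j in range(n)]            # Sigma_2 B^T
    m1, m2, v1, v2 = dot(B, mu1), dot(B, mu2), dot(B, SB1), dot(B, SB2)
    a = 0.5 * (m1 + m2)                                 # the single root
    D1 = 0.5 * math.erfc(-(a - m1) / math.sqrt(2.0 * v1))
    D2 = 0.5 * math.erfc(-(a - m2) / math.sqrt(2.0 * v2))
    p1a = math.exp(-(a - m1) ** 2 / (2.0 * v1)) / math.sqrt(2.0 * math.pi * v1)
    sign = 1.0 if m1 < a else -1.0                      # +1 when B mu1 < a, -1 when B mu2 < a
    g = 0.5 - sign * 0.5 * (D1 - D2)
    grad = [sign * 0.5 * p1a * ((mu1[j] - mu2[j]) - (SB1[j] + SB2[j]) * (m1 - m2) / (v1 + v2))
            for j in range(n)]
    return g, grad

mu1, mu2 = [1.0, 1.0], [0.0, 0.0]
S = [[2.0, 0.0], [0.0, 1.0]]
g0, grad0 = single_root([0.25, 0.5], mu1, mu2, S, S)   # (mu1 - mu2)^T (S + S)^{-1}
print(g0, grad0)                                        # grad0 should be (numerically) zero
```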


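Two Root Case

Let $a_1$, $a_2$ denote the two distinct roots of the quadratic equation, arranged so that $a_1 < a_2$. Then one can easily verify that

$$g(B) = \begin{cases} \tfrac{1}{2} - K, & \text{if } R_1(B) = [a_1, a_2], \\ \tfrac{1}{2} + K, & \text{if } R_2(B) = (a_1, a_2), \end{cases}$$

where

$$K = \tfrac{1}{2}\left\{\left[D_1(a_2,B) - D_1(a_1,B)\right] - \left[D_2(a_2,B) - D_2(a_1,B)\right]\right\},$$

and

$$\frac{\partial g}{\partial B} = \begin{cases} \tfrac{1}{2}\left[K_1\mu_1 - K_2\mu_2 + K_3\dfrac{\Sigma_1B^T}{B\Sigma_1B^T} - K_4\dfrac{\Sigma_2B^T}{B\Sigma_2B^T}\right], & \text{if } R_1(B) = [a_1, a_2], \\ \tfrac{1}{2}\left[K_2\mu_2 - K_1\mu_1 + K_4\dfrac{\Sigma_2B^T}{B\Sigma_2B^T} - K_3\dfrac{\Sigma_1B^T}{B\Sigma_1B^T}\right], & \text{if } R_2(B) = (a_1, a_2), \end{cases}$$

where

$$\begin{aligned}
K_1 &= p_1(a_2,B) - p_1(a_1,B), \\
K_2 &= p_2(a_2,B) - p_2(a_1,B), \\
K_3 &= (a_2 - B\mu_1)\,p_1(a_2,B) - (a_1 - B\mu_1)\,p_1(a_1,B), \\
K_4 &= (a_2 - B\mu_2)\,p_2(a_2,B) - (a_1 - B\mu_2)\,p_2(a_1,B).
\end{aligned}$$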
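The two-root formulas can be sketched the same way. In the Python below (hypothetical names; symmetric covariance matrices and two distinct roots assumed), the applicable configuration is read off from the sign of $\alpha(B)$: when $\alpha(B) < 0$ the quadratic opens downward, so F(y,B) ≥ 0 between the roots and $R_1(B) = [a_1, a_2]$. The usage lines compare the gradient with a central difference of g.

```python
import math

def two_root(B, mu1, mu2, S1, S2):
    """g(B) and dg/dB in the two-root case (alpha(B) != 0), using K and K1..K4."""
    n = len(B)

    def dot(u, w):
        return sum(u[j] * w[j] for j in range(n))

    def cdf(x, m, v):          # D_i(x, B) = Phi((x - B mu_i) / sqrt(B Sigma_i B^T))
        return 0.5 * math.erfc(-(x - m) / math.sqrt(2.0 * v))

    def pdf(x, m, v):          # p_i(x, B)
        return math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

    SB1 = [dot(S1[j], B) for j in range(n)]   # Sigma_1 B^T
    SB2 = [dot(S2[j], B) for j in range(n)]   # Sigma_2 B^T
    m1, m2, v1, v2 = dot(B, mu1), dot(B, mu2), dot(B, SB1), dot(B, SB2)
    al, be = v1 - v2, v2 * m1 - v1 * m2
    ga = v1 * m2 ** 2 - v2 * m1 ** 2 + v1 * v2 * math.log(v2 / v1)
    d = math.sqrt(be * be - al * ga)
    a1, a2 = sorted([(-be - d) / al, (-be + d) / al])
    K = 0.5 * ((cdf(a2, m1, v1) - cdf(a1, m1, v1)) - (cdf(a2, m2, v2) - cdf(a1, m2, v2)))
    K1 = pdf(a2, m1, v1) - pdf(a1, m1, v1)
    K2 = pdf(a2, m2, v2) - pdf(a1, m2, v2)
    K3 = (a2 - m1) * pdf(a2, m1, v1) - (a1 - m1) * pdf(a1, m1, v1)
    K4 = (a2 - m2) * pdf(a2, m2, v2) - (a1 - m2) * pdf(a1, m2, v2)
    sign = 1.0 if al < 0 else -1.0            # +1: R1(B) = [a1, a2];  -1: R2(B) = (a1, a2)
    g = 0.5 - sign * K
    grad = [sign * 0.5 * (K1 * mu1[j] - K2 * mu2[j] + K3 * SB1[j] / v1 - K4 * SB2[j] / v2)
            for j in range(n)]
    return g, grad

mu1, S1 = [1.0, 0.0], [[1.0, 0.2], [0.2, 2.0]]
mu2, S2 = [0.0, 0.5], [[1.5, 0.0], [0.0, 0.5]]
B = [0.7, -0.4]
gB, grad = two_root(B, mu1, mu2, S1, S2)
s = 1e-6
for j in range(len(B)):
    e = [s if k == j else 0.0 for k in range(len(B))]
    fd = (two_root([b + t for b, t in zip(B, e)], mu1, mu2, S1, S2)[0]
          - two_root([b - t for b, t in zip(B, e)], mu1, mu2, S1, S2)[0]) / (2 * s)
    print(j, grad[j], fd)   # analytic and finite-difference derivatives should agree
```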


It should be noted that for the case of equal $n \times n$ covariance matrices, we always have the single root case, and one can verify that

$$B = (\mu_1 - \mu_2)^T(\Sigma_1 + \Sigma_2)^{-1}$$

satisfies $\partial g/\partial B = 0$. This suggests that one should start the iterative procedure using this B as an initial guess, even if $\Sigma_1 \neq \Sigma_2$.
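Computing this starting vector needs only one linear solve: since $\Sigma_1 + \Sigma_2$ is symmetric, $B^T$ solves $(\Sigma_1 + \Sigma_2)B^T = \mu_1 - \mu_2$. The Python sketch below uses Gaussian elimination with partial pivoting; that choice is ours, as the report does not say how the inverse was computed.

```python
def initial_guess(mu1, mu2, S1, S2):
    """B0 = (mu1 - mu2)^T (S1 + S2)^{-1}, returned as a list (the row vector B0)."""
    n = len(mu1)
    # Augmented matrix [S1 + S2 | mu1 - mu2].
    A = [[S1[i][j] + S2[i][j] for j in range(n)] + [mu1[i] - mu2[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))   # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                                  # back substitution
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

S = [[2.0, 0.0], [0.0, 1.0]]
print(initial_guess([1.0, 1.0], [0.0, 0.0], S, S))   # [0.25, 0.5]
```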
A flow chart for the preliminary version of our procedure is presented in the remainder of this section.

[Flow chart not reproduced. Recoverable labels: MAIN-1, read the class means into XMEAN(I,J) and the class covariance matrices into COVAR(I,J,K); MAIN-2; SHUTLE-1, compute $\alpha(B)$, $\beta(B)$, $\gamma(B)$.]
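SUBROUTINE DAVIDN itself is not reproduced here. For orientation only, the following Python sketch is a generic Davidon-Fletcher-Powell iteration; the backtracking (Armijo) line search, the tolerances, and the restart rule are our assumptions rather than details of DAVIDN. In use, `f` would be g(B) and `grad` would be $\partial g/\partial B$ from the single-root or two-root formulas; the example below uses a simple quadratic so the result is easy to verify.

```python
def dfp_minimize(f, grad, x0, iters=200, tol=1e-10):
    """Davidon-Fletcher-Powell quasi-Newton iteration with a backtracking line search."""
    n = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # inverse-Hessian estimate
    gx = grad(x)
    for _ in range(iters):
        if sum(v * v for v in gx) < tol ** 2:
            break
        d = [-sum(H[i][j] * gx[j] for j in range(n)) for i in range(n)]
        slope = sum(gx[i] * d[i] for i in range(n))
        if slope >= 0:  # not a descent direction: restart from steepest descent
            H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
            d = [-v for v in gx]
            slope = -sum(v * v for v in gx)
        t, fx = 1.0, f(x)
        while f([x[i] + t * d[i] for i in range(n)]) > fx + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
        x_new = [x[i] + t * d[i] for i in range(n)]
        g_new = grad(x_new)
        s = [x_new[i] - x[i] for i in range(n)]
        y = [g_new[i] - gx[i] for i in range(n)]
        sy = sum(s[i] * y[i] for i in range(n))
        Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
        yHy = sum(y[i] * Hy[i] for i in range(n))
        if sy > 1e-16 and yHy > 1e-16:  # DFP update: H + s s^T/(s.y) - (Hy)(Hy)^T/(y.Hy)
            for i in range(n):
                for j in range(n):
                    H[i][j] += s[i] * s[j] / sy - Hy[i] * Hy[j] / yHy
        x, gx = x_new, g_new
    return x

f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]
print(dfp_minimize(f, grad, [0.0, 0.0]))   # close to [1.0, -2.0]
```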
4. Preliminary Numerical Results

In this section we present some preliminary results of the computational procedure discussed in the previous section. Mean vectors and covariance matrices for several pairs of classes from C-1 Flight Line Data were used as population parameters. These parameters are given on pages 29-33.

In each of Cases 1-4, presented in the following pages, the formula

$$(**) \qquad B_0 = (\mu_1 - \mu_2)^T(\Sigma_1 + \Sigma_2)^{-1}$$

is used to compute the initial vector, $B_0$, for starting the iterative procedure. The final vector B, which minimizes the probability of misclassification, was determined using the computational procedure discussed in the previous section. The values of the transformed means, transformed covariances, and probability of misclassification for $B_0$ and B are given in each case. We have also given the number of iterations (computations of g) needed to determine B for each of the cases.



Case 1

Population parameters: Class 1 (μ1, Σ1) and Class 2 (μ2, Σ2).

        Initial vector, B0,      Final vector, B,
 j      computed using (**)      which minimizes g
 1        .152381                  .149740
 2        .103120                  .101319
 3       -.210958                 -.207372
 4       -.271028                 -.266399
 5       -.049500                 -.047121
 6        .485537                  .477211
 7        .375000                  .368575
 8        .003148                  .003040
 9       -.203273                 -.199847
10       -.673343                 -.661844
11        .054181                  .052954
12        .082849                  .081234

g(B0)        = .017290986      g(B)        = .017290920
B0 Σ1 B0^T   = 4.517525        B Σ1 B^T    = 4.361629
B0 Σ2 B0^T   = 4.413498        B Σ2 B^T    = 4.257698
B0 μ1        = -34.301838      B μ1        = -33.855418
B0 μ2        = -43.232862      B μ2        = -42.629203

Number of iterations:
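Because g depends on B only through $B\mu_i$ and $B\Sigma_iB^T$, the tabulated transformed statistics are enough to recompute g. The Python sketch below does this for Case 1 (helper names are ours); the recomputed values should come out close to the tabulated g(B0) and g(B), although the six-decimal rounding of the table entries limits how many digits can be reproduced.

```python
import math

def g_from_stats(m1, v1, m2, v2):
    """g from m_i = B mu_i and v_i = B Sigma_i B^T, splitting the line at the roots of (*)."""
    def F(y):                                            # log(p1 / p2)
        return 0.5 * math.log(v2 / v1) - (y - m1) ** 2 / (2 * v1) + (y - m2) ** 2 / (2 * v2)

    def D(x, m, v):                                      # Phi((x - m) / sqrt(v))
        return 0.5 * math.erfc(-(x - m) / math.sqrt(2 * v))

    al, be = v1 - v2, v2 * m1 - v1 * m2
    ga = v1 * m2 ** 2 - v2 * m1 ** 2 + v1 * v2 * math.log(v2 / v1)
    if al == 0.0:
        roots = [-ga / (2 * be)]
    else:
        d = math.sqrt(be * be - al * ga)
        roots = sorted([(-be - d) / al, (-be + d) / al])
    pts = [-math.inf] + roots + [math.inf]
    g = 0.0
    for lo, hi in zip(pts, pts[1:]):
        t = hi - 1.0 if lo == -math.inf else (lo + 1.0 if hi == math.inf else 0.5 * (lo + hi))
        m, v = (m2, v2) if F(t) >= 0 else (m1, v1)      # R1(B): integrate p2; R2(B): p1
        g += 0.5 * (D(hi, m, v) - D(lo, m, v))
    return g

# Case 1: transformed means and variances for B0 and for the final B.
g_B0 = g_from_stats(-34.301838, 4.517525, -43.232862, 4.413498)
g_B = g_from_stats(-33.855418, 4.361629, -42.629203, 4.257698)
print(g_B0, g_B)    # compare with g(B0) = .017290986 and g(B) = .017290920
```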
Case 2

Population parameters: Class 3 (μ1, Σ1) and Class 4 (μ2, Σ2).

        Initial vector, B0,      Final vector, B,
 j      computed using (**)      which minimizes g
 1       -.213967                 -.196691
 2       -.469183                 -.438855
 3        .177938                  .179309
 4       -.001497                 -.018567
 5        .188612                  .140462
 6       -.103755                 -.274768
 7       -.097442                 -.156862
 8        .009932                  .081037
 9        .601157                  .729173
10       -.086085                  .009754
11       -.084609                 -.145805
12       -.145998                 -.238270

g(B0)        = .003018366      g(B)        = .002715326
B0 Σ1 B0^T   = 10.706851       B Σ1 B^T    = 25.185144
B0 Σ2 B0^T   = 2.607698        B Σ2 B^T    = 45.544406
B0 μ1        = -35.144301      B μ1        = -48.456476
B0 μ2        = -48.458851      B μ2        = -68.128863

Number of iterations:
Case 3

Population parameters: Class 5 (μ1, Σ1) and Class 6 (μ2, Σ2).

        Initial vector, B0,      Final vector, B,
 j      computed using (**)      which minimizes g
 1        .808819                  .577773
 2        .231155                  .167764
 3       -.290136                 -.205380
 4        .193884                  .140068
 5        .118042                  .088422
 6        .755009                  .540627
 7        .164943                  .118282
 8       -.362365                 -.253048
 9       -.447546                 -.314009
10        .440773                 -.310622
11        .014834                  .011237
12       -.083220                 -.053831

g(B0)        = .024515472      g(B)        = .024407769
B0 Σ1 B0^T   = 4.523550        B Σ1 B^T    = 2.443918
B0 Σ2 B0^T   = 3.145724        B Σ2 B^T    = 1.586862
B0 μ1        = -111.417584     B μ1        = -85.707183
B0 μ2        = -103.748308     B μ2        = -60.153584

Number of iterations:
Case 4

Population parameters: Class 7 (μ1, Σ1) and Class 8 (μ2, Σ2).

        Initial vector, B0,      Final vector, B,
 j      computed using (**)      which minimizes g
 1       2.243915                 Same as B0
 2        .945839
 3        .308322
 4       1.222243
 5       -.500153
 6      -1.406996
 7      -1.643716
 8      -1.009229
 9        .829381
10       -.490735
11       -.610768
12      -1.316042

g(B0)        = .20299 × 10^-11     g(B): same as g(B0)
B0 Σ1 B0^T   = 5.866767
B0 Σ2 B0^T   = 3.391677
B0 μ1        = -176.557149
B0 μ2        = -273.815594
B Σ1 B^T, B Σ2 B^T, B μ1, B μ2: same as for B0

Number of iterations:
5. Concluding Remarks

Although the theory and subsequent computational procedure presented herein yield encouraging numerical results, there is still much work to be done. Extensive testing of the existing program is needed. Other optimization techniques should be tested as substitutes for the Davidon-Fletcher-Powell routine. Theoretical results are needed which justify the use of the formula (**) of Section 4 for computing starting vectors.

The theory for the two-population case can be extended to the case of more than two populations. The associated computational procedure is in the developmental stage.

Finally, we note the possibility of developing a suboptimal method, using the computational procedure discussed herein, which determines a $k \times n$ matrix B one row at a time so as to minimize the probability of misclassification in k-dimensional space. Such a procedure has not been developed, even in the case of two populations.